MAGAN: Margin Adaptation for Generative Adversarial Networks
We propose the Margin Adaptation for Generative Adversarial Networks (MAGAN)
algorithm, a novel training procedure for GANs to improve stability and
performance by using an adaptive hinge loss function. We estimate the
appropriate hinge loss margin with the expected energy of the target
distribution, and derive principled criteria for when to update the margin. We
prove that our method converges to its global optimum under certain
assumptions. Evaluated on the task of unsupervised image generation, the
proposed training procedure is simple yet robust on a diverse set of data, and
achieves qualitative and quantitative improvements compared to the
state-of-the-art.
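As a rough illustration of the idea (a sketch, not the authors' implementation), an energy-based discriminator with a hinge loss whose margin tracks the expected energy of real samples could look like the following; the exponential-moving-average update and the `ema` coefficient are assumptions for illustration:

```python
import numpy as np

def hinge_d_loss(d_real, d_fake, margin):
    # Energy-based hinge loss: minimize the energy of real samples and
    # push fake-sample energies up to the margin (zero loss beyond it).
    return d_real.mean() + np.maximum(margin - d_fake, 0.0).mean()

def update_margin(margin, d_real, ema=0.9):
    # Hypothetical margin update: track the expected energy of the
    # target (real) distribution with an exponential moving average.
    return ema * margin + (1.0 - ema) * d_real.mean()
```

With this loss, fake samples whose energy already exceeds the margin contribute nothing to the gradient, which is what makes the margin choice matter.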
Unsupervised Hyperbolic Representation Learning via Message Passing Auto-Encoders
Most of the existing literature on hyperbolic embeddings concentrates on
supervised learning, whereas unsupervised hyperbolic embedding remains less
well explored. In this paper, we analyze how unsupervised tasks can
benefit from learned representations in hyperbolic space. To explore how well
the hierarchical structure of unlabeled data can be represented in hyperbolic
spaces, we design a novel hyperbolic message passing auto-encoder whose overall
auto-encoding is performed in hyperbolic space. The proposed model
auto-encodes networks by fully exploiting hyperbolic geometry in message
passing. Through extensive quantitative and qualitative analyses, we validate
the properties and benefits of the unsupervised hyperbolic representations.
Code is available at https://github.com/junhocho/HGCAE.
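Auto-encoding in hyperbolic space typically relies on exponential and logarithmic maps to move between Euclidean feature vectors and points on a hyperbolic manifold. A minimal sketch on the Poincaré ball (standard formulas, not the paper's specific message-passing layers) looks like this:

```python
import numpy as np

def expmap0(v, c=1.0):
    # Exponential map at the origin of the Poincare ball with curvature
    # -c: lift a tangent (Euclidean) vector onto the manifold, e.g. the
    # output of a Euclidean encoder layer.
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)

def logmap0(x, c=1.0):
    # Logarithmic map at the origin: the inverse of expmap0, mapping a
    # ball point back to the tangent space for Euclidean operations.
    norm = np.linalg.norm(x)
    if norm == 0:
        return x
    return np.arctanh(np.sqrt(c) * norm) * x / (np.sqrt(c) * norm)

def mobius_add(x, y, c=1.0):
    # Mobius addition, the hyperbolic analogue of vector addition used
    # to aggregate messages directly on the ball.
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den
```

A hyperbolic message-passing layer can then aggregate neighbor features in the tangent space via `logmap0`, or combine them on the ball with `mobius_add`.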
Migration of Elastic Capsules by an Optical Force in a Uniform Flow
The behavior of an elastic capsule under an optical force in a uniform flow is examined using the penalty immersed boundary method. The elastic capsule is subjected to a laser beam with a Gaussian intensity distribution propagating perpendicular to the fluid flow. The capsule migrates under the optical force along the direction of beam propagation, and the migration distance depends on its properties. An oblate capsule with b/a = 0.5 obeying the neo-Hookean constitutive law is first considered, and the effects of the surface Young's modulus and the initial inclination angle on the migration distance are studied. The migration distance of the oblate capsule increases with the surface Young's modulus, and the non-inclined oblate capsule migrates farther than inclined ones. Then spherical, oblate, and biconcave capsules obeying the Skalak constitutive law are considered. A comparison of the trajectories indicates that the migration of the spherical capsule is the largest. Unlike the oblate capsule, the non-inclined biconcave capsule migrates less than capsules at other inclination angles due to its initial shape.
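The transverse Gaussian intensity profile of the laser beam described above can be sketched as follows; the peak intensity `I0` and waist radius `w` are illustrative parameters, not values from the study:

```python
import numpy as np

def gaussian_beam_intensity(r, I0, w):
    # Transverse intensity of a Gaussian laser beam at radial distance r
    # from the beam axis: peak I0 on-axis, 1/e^2 radius w. The optical
    # force on the capsule surface scales with the local intensity.
    return I0 * np.exp(-2.0 * r ** 2 / w ** 2)
```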
Contrastive Vicinal Space for Unsupervised Domain Adaptation
Recent unsupervised domain adaptation methods have utilized vicinal space
between the source and target domains. However, the equilibrium collapse of
labels, a problem where the source labels are dominant over the target labels
in the predictions of vicinal instances, has never been addressed. In this
paper, we propose an instance-wise minimax strategy that minimizes the entropy
of high uncertainty instances in the vicinal space to tackle the stated
problem. We divide the vicinal space into two subspaces through the solution of
the minimax problem: contrastive space and consensus space. In the contrastive
space, inter-domain discrepancy is mitigated by constraining instances to have
contrastive views and labels, and the consensus space reduces the confusion
between intra-domain categories. The effectiveness of our method is
demonstrated on public benchmarks, including Office-31, Office-Home, and
VisDA-C, achieving state-of-the-art performances. We further show that our
method outperforms the current state-of-the-art methods on PACS, which
indicates that our instance-wise approach works well for multi-source domain
adaptation as well. Code is available at https://github.com/NaJaeMin92/CoVi.
Comment: 10 pages, 7 figures, 5 tables
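Two of the ingredients above can be sketched in a few lines: a vicinal instance as a convex (mixup-style) combination of a source and a target sample, and the prediction entropy that the minimax strategy minimizes for high-uncertainty vicinal instances. This is a simplified reading of the abstract, not the authors' code:

```python
import numpy as np

def vicinal_instance(x_src, x_tgt, lam):
    # A vicinal instance between domains: a convex combination of a
    # source and a target sample, lam in (0, 1).
    return lam * x_src + (1.0 - lam) * x_tgt

def prediction_entropy(p, eps=1e-12):
    # Shannon entropy of a predicted class distribution; high entropy
    # flags the uncertain vicinal instances targeted by the minimax step.
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p))
```

Minimizing this entropy pushes predictions on vicinal instances toward confident, single-label outputs, which counteracts the dominance of source labels described above.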
High-Fidelity Eye Animatable Neural Radiance Fields for Human Face
Face rendering using neural radiance fields (NeRF) is a rapidly developing
research area in computer vision. While recent methods primarily focus on
controlling facial attributes such as identity and expression, they often
overlook the crucial aspect of modeling eyeball rotation, which holds
importance for various downstream tasks. In this paper, we aim to learn a face
NeRF model that is sensitive to eye movements from multi-view images. We
address two key challenges in eye-aware face NeRF learning: how to effectively
capture eyeball rotation for training and how to construct a manifold for
representing eyeball rotation. To accomplish this, we first fit FLAME, a
well-established parametric face model, to the multi-view images considering
multi-view consistency. Subsequently, we introduce a new Dynamic Eye-aware NeRF
(DeNeRF). DeNeRF transforms 3D points from different views into a canonical
space to learn a unified face NeRF model. We design an eye deformation field
for the transformation, including rigid transformation, e.g., eyeball rotation,
and non-rigid transformation. Through experiments conducted on the ETH-XGaze
dataset, we demonstrate that our model is capable of generating high-fidelity
images with accurate eyeball rotation and non-rigid periocular deformation,
even under novel viewing angles. Furthermore, we show that utilizing the
rendered images can effectively enhance gaze estimation performance.
Comment: Under review
Self-Supervised Visual Learning by Variable Playback Speeds Prediction of a Video
We propose a self-supervised visual learning method by predicting the
variable playback speeds of a video. Without semantic labels, we learn the
spatio-temporal visual representation of the video by leveraging the variations
in the visual appearance according to different playback speeds under the
assumption of temporal coherence. To learn the spatio-temporal visual
variations over the entire video, we not only predict a single playback speed
but also generate clips of various playback speeds and directions with
randomized starting points. Hence, the visual representation can be successfully
learned from the meta information (playback speeds and directions) of the
video. We also propose a new layer dependable temporal group normalization
method that can be applied to 3D convolutional networks to improve the
representation learning performance where we divide the temporal features into
several groups and normalize each one using the different corresponding
parameters. We validate the effectiveness of our method by fine-tuning it to
the action recognition and video retrieval tasks on UCF-101 and HMDB-51.Comment: Accepted by IEEE Access on May 19, 202
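The grouping idea behind temporal group normalization can be sketched as follows: split the temporal axis of a 3D feature map into groups and normalize each group with its own statistics. This is a simplified, single-sample illustration (no learnable affine parameters), not the paper's implementation:

```python
import numpy as np

def temporal_group_norm(x, num_groups, eps=1e-5):
    # x: feature map of shape (C, T, H, W). Split the temporal axis T
    # into num_groups contiguous groups and normalize each group using
    # statistics computed over that group only.
    C, T, H, W = x.shape
    assert T % num_groups == 0, "T must be divisible by num_groups"
    g = x.reshape(C, num_groups, T // num_groups, H, W)
    mean = g.mean(axis=(0, 2, 3, 4), keepdims=True)
    var = g.var(axis=(0, 2, 3, 4), keepdims=True)
    g = (g - mean) / np.sqrt(var + eps)
    return g.reshape(C, T, H, W)
```

In the full method each group would additionally carry its own learnable scale and shift parameters; they are omitted here for brevity.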